-
Notifications
You must be signed in to change notification settings - Fork 56
feat(storage): implement milvus vector store #744
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #744 +/- ##
===================================================
- Coverage 87.65179% 87.63076% -0.02104%
===================================================
Files 281 285 +4
Lines 36564 37472 +908
===================================================
+ Hits 32049 32837 +788
- Misses 2934 3001 +67
- Partials 1581 1634 +53
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
4829443 to
28587e5
Compare
|
|
||
| func (vs *VectorStore) queryMetadataBatch(ctx context.Context, limit, offset int, ids []string, filter map[string]any) (map[string]vectorstore.DocumentMetadata, error) { | ||
| filterExpr := "" | ||
| if len(ids) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这两个条件是不是用 and 的关系也可以,现在是互斥的?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, 可以用and,改成用searchfilter重新构建语句了
| annReqs = append(annReqs, annReq) | ||
| } | ||
|
|
||
| hybridOption := client.NewHybridSearchOption(vs.option.collectionName, vs.getMaxResults(query.Limit), annReqs...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
milvus 默认使用的RRFReranker ,这里返回的score会归一化吗
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
看了下milvus请求的构建和文档,如果没有指定的话,应该没有使用RRFreranker
我另外加了reranker option,可以自定义配置
5e08ae2 to
196c8b0
Compare
| return "", nil | ||
| } | ||
|
|
||
| switch condition.Operator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里的and 和 or条件构建函数是不是可以合并成一个函数, 以及 下面的 ==、!=、 >、>= 、< 、<= 这些也可以合并成一个函数。 这里的代码重复率太高了。可以参考:tcvector
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| // milvusFilterConverter converts searchfilter conditions to Milvus expressions | ||
| type milvusFilterConverter struct{} | ||
|
|
||
| func (c *milvusFilterConverter) Convert(condition *searchfilter.UniversalFilterCondition) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里最好加一个recover 处理一下可能的panic。可以参考 tcvector的实现
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| return c.convertInCondition(condition) | ||
| case searchfilter.OperatorNotIn: | ||
| return c.convertNotInCondition(condition) | ||
| case searchfilter.OperatorLike: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
是不是少了一个not like 的实现
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
好像也还少了一个 searchfilter.OperatorBetween
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done,还有between
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
需要实现一下docBuilder, 方便支持用户自定义存储结构的转换处理。参考
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里其实已经实现了,不需要docBuilder,在创建存储collection和查询的时候都已经支持自定义存储结构
这里是指用户不是通过框架来创建的collection, 自己提前创建非标准的collection (这里主要是业务可能有自己独立的离线知识库处理流程,只用knowledge的retriever能力)。 所以需要一个WithDocBuilder, 让用户把非标的document存储结构转换成标准结构
可以看下docBuilder使用例子
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
理解了,加上docBuilder了,单元测试里也加上了
| ) | ||
|
|
||
| // Client is the interface for the milvus client. | ||
| type Client interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里看着都是按照v2的返回值定义的interface。 会有和v1有兼容性问题吗? 或者将来出v3是不是有可能接口兼容不了呢? 这里是不是应该定义和版本无关的接口。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里没有啥v1,之前的milvus-go-sdk已经废弃了,以后都会在milvus仓库维护了
这里主要是为了实现mock client,和milvus server本身没有啥兼容性问题
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里用过滤器模版是不是会好点
| conditions = append(conditions, fmt.Sprintf("%s in [%s]", vs.option.idField, strings.Join(idFilters, ", "))) | ||
| } | ||
|
|
||
| if len(filter.Metadata) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
filter.Metadata 以及 filter.IDs的条件构建是不是可以统一转换到filter.FilterCondition中,交给Converter统一来处理
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| return maxResults | ||
| } | ||
|
|
||
| func (vs *VectorStore) buildMetadataFilter(filter map[string]any) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里的格式转换建议统一放到Converter中来处理。filter.FilterCondition中可能也会存在metadata.xx的条件
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
已经没有buildMetadataFilter了,全部交由Converter来处理
|
|
||
| func (c *milvusFilterConverter) convertCondition(condition *searchfilter.UniversalFilterCondition) (string, error) { | ||
| if condition == nil { | ||
| return "", nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
考虑与其他db的对齐一下,return error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
|
||
| func (c *milvusFilterConverter) convertAndCondition(condition *searchfilter.UniversalFilterCondition) (string, error) { | ||
| if condition.Value == nil { | ||
| return "", nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
不符合预期的条件或者value,建议都返回error。尽量别把error隐藏了
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
196c8b0 to
137f586
Compare
| } | ||
| }() | ||
|
|
||
| if condition == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个条件判断貌似可以去掉, convertCondition方法中已经做了统一判断
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里是不是搞反了,应该保留convertCondition中的判断, 去掉Convert中的判断。 因为AND OR逻辑条件时会递归调用convertCondition,所以这里统一在convertCondition做判断处理会完整一点
137f586 to
9ba3bfe
Compare
| // - String values must be enclosed in quotes | ||
| // - Numeric values (int, float) should not be quoted | ||
| // - Boolean values should not be quoted | ||
| func (c *milvusFilterConverter) formatValue(value interface{}) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
time.Time类型需要做特殊处理吗?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
time.Time类型现在当int64来处理了
| // - String values must be enclosed in quotes | ||
| // - Numeric values (int, float) should not be quoted | ||
| // - Boolean values should not be quoted | ||
| func (c *milvusFilterConverter) formatValue(value interface{}) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
milvus文档 这里使用过滤器模版是不是在性能和value兼容性上会好点?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
对的,用过滤器模版会好点,这个我改下
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete函数不支持filter_params,所以delete还是用原始语句
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, 删除操作目前只支持根据id来删, 还好。
| return nil, fmt.Errorf("create milvus client from instance failed: %w", err) | ||
| } | ||
| } else { | ||
| if option.address == "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
建议这里只构建builderOpts,client的创建逻辑统一由ClientBuilder来处理。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| } | ||
|
|
||
| // Update document using upsert | ||
| upsertOption := client.NewColumnBasedInsertOption(vs.option.collectionName).WithVarcharColumn(vs.option.idField, []string{doc.ID}). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
覆盖模式下的更新需要带上全部字段吧,貌似只有开启了nullable的字段才可以省略?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
9ba3bfe to
5a43515
Compare
| for key, val := range cr.params { | ||
| queryOption.WithTemplateParam(key, val) | ||
| } | ||
| queryOption.WithOutputFields(vs.option.allFields...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个WithOutputFields是必须要设置的吗?有没有默认输出所有字段的设置,不然这里可能会有一个问题:如果是业务自定义的collection,非标字段是不是都拿不到了?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, WithOutputFields([]string{""}...) “”代表所有字段
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* 代表所有字段
| } | ||
| queryOption.WithOutputFields([]string{ | ||
| vs.option.idField, | ||
| vs.option.metadataField, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里也是一样,自定义collection不一定有metadata字段。 有可能是需要在自定义docBuilder中构建的metadata
5a43515 to
90e0164
Compare
| if vs.option.docBuilder != nil { | ||
| for i := 0; i < resultLen; i++ { | ||
| row := make([]column.Column, len(vs.option.allFields)) | ||
| for j, field := range vs.option.allFields { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里也需要改一下,按照vs.option.allFields来取字段不合适了
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, 直接以milvus server返回的字段为准
Signed-off-by: joey <[email protected]>
90e0164 to
994cf61
Compare


No description provided.